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Abstract 

In this paper, we describe a novel application of sigma-point methods 
to continuous-discrete filtering. In principle, the nonlinear continuous- 
discrete filtering problem can be solved exactly. In practice, the solution 
contains terms that are computationally intractible. Assumed density 
filtering methods attempt to match statistics of the filtering distribution 
to some set of more tractible probability distributions. We describe a 
novel method that decomposes the Brownian motion driving the signal in 
a generalised Fourier series, which is truncated after a number of terms. 
This approximation to Brownian can be described using a relatively small 
number of Fourier coefficients, and allows us to compute statistics of the 
filtering distribution with a single application of a sigma-point method. 

Assumed density filters that exist in the literature usually rely on 
discretisation of the signal dynamics followed by iterated application of a 
sigma point transform (or a limiting case thereof). Iterating the transform 
in this manner can lead to loss of information about the filtering distri- 
bution in highly nonlinear settings. We demonstrate that our method is 
better equipped to cope with such problems. 

1 Introduction 

Stochastic differential equations (SDEs) provide a natural way to describe the 
evolution of systems that are inherently noisy or contain unknovirn subphenom- 
ena that can be modeled as stochastic processes [l][2]. Let's say that the evo- 
lution of an idealized system could be modeled with the ordinary differential 
equation (ODE) 
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where Xf G M" is the state of the system. Roughly speaking, to construct an 
SDE, one adds a 'white' driving noise to the dynamics of an ODE. From the 
modeUing perspective, the purpose of the noise is to capture deviations from 
the ideal deterministic model. The amplitude of this driving noise may poten- 
tially depend on the current state Xt of the system. The result is a differential 
equation 

^ = a{Xt) + B{Xt)Wu (2) 

where Wt € M'' is Gaussian white noise. Because of the highly irregular nature of 
continuous-time white noise, one needs to be careful when defining this equation 
mathematically. In order to do this, the usual approach is re-write ^ as an 
integral equation and interpret the second term on the right as an Ito stochastic 
integral [2)[3]: 

Xt=xo+ [ a{Xu)du + [ B{Xu)dWu. (3) 
Jo Jo 

This allows us to interpret the dynamics as an Ito stochastic differential 
equation: 

dXt = a{Xt)dt + B{Xt)dWu Xo = xq. (4) 

The solution Xt will then be an ltd diffusion process. Here, the term dWt denotes 
the infinitesimal change in a Brownian motion Wt € M'' at time t. We assume 

is a standard Brownian motion, so that its components are independent with 
variance t at time t. One must make some assumptions on a and B to ensure 
equation Q has a unique solution. If both functions are globally Lipschitz and 
grow at most linearly, one is assured that this will be the case |3 . 

It is often the case that one cannot observe the process Xt directly — instead, 
one must rely on time-discrete, noisy observations {Yt^ £ M'*}i>i of the process. 
In mathematical terms, the model for measurements of this type can often be 
written as 

Yt^=hiXtJ + Vt^, (5) 

with Gaussian measurement noises Vt- ^ N{Q,Ri). One is then often faced 
with the task of computing the expectation E[(f>{Xt)\Yti , ■ ■ ■ , Yt^], where t > tn 
for some given function (f). This is known as the continuous- discrete filtering 
problem. For simplicity, we assume that the conditional distribution of Xt has a 
density with respect to the Lebesgue measure. For filtering problems where this 
is not the case, such as when part of the system is observed without error, much 
of our analysis can be applied with only minor modifications. The estimation 
problem can be solved for arbitrary provided that we can compute the filtering 
density pxt {x \{Yt^ : tk < t}) for all t. This latter approach is often called the 
probabilistic or Bayesian approach to the filtering problem fl]. 
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It is only in a small number of special cases that the conditional distribution 
of Xt can be described using a finite number of parameters [4| — when the SDE is 
linear and the function h in the measurement model is linear, then the Kalman 
filter can be used to compute the exact solution [Hj. In all the other cases one 
must approximate the filtering distribution in some manner: for example by 
discretising the signal and employing a particle filter [^jsj, which uses Monte 
Carlo samples to approximate the filtering distribution. Another general way 
is to take a parametric set of tractable densities (for example a set of densities 
within the exponential family) and find the density within that set that most 
closely matches the filtering density. This approach, introduced in fo' , is known 
as assumed density filtering. 

In this paper, we will attempt to compute statistics of the Gaussian dis- 
tribution that most closely matches the filtering distribution. This particular 



special case of assumed density filtering is known as Gaussian filtering 10 
There are a number of ways to approach the problem. The extended Kalman 
filter (EKF) [l] uses a Taylor series approximation to the non-linearities in SDE 



and measurement model. The unscented Kalman filter (UKF) |11 13 



uses a 



set of sigma-points for computing the mean and covariance of the Gaussian ap- 



proximation. Quadrature and cubature based filters [10| 14J- 16] use Gaussian 
numerical integration for computing the mean and covariance. 

The commonly used approaches to filtering in continuous-discrete systems 
can be divided into two categories: (a) we first discretize the SDE using methods 
such as Ito-Taylor series or a stochastic Runge-Kutta discretisation [Tt], [s], 
and then use discrete-time filtering algorithms or (b) we form an approximate 
filter that operates in continuous time and then we discretize that. The relative 
metrits of these approaches were recently studied in [18| . 

In the current paper, we take a different approach. Recall that if we fix a 
probability space (Q,J-t,P), then for each fixed uj € fl we can consider Brownian 
motion as a function t Wtito). When this function is fed into the SDE, the 
solution at time tk given the solution at time tk-i, Xt^_-^ = xt^_-^, can be 
considered as the transition map Xt^{uj) — /(xf^ ^ , w), where w G f2. We can 
think of the transition map as a functional of i — > Wt(w). That is, Xt^{uj) = 

F[xt,_,,{Wtito):te[tk^i,tk]}]- 

One can view the Brownian motion Wt as a random element of the Hilbert 

space L'^[0,T]. It is an inherently infinite-dimensional object. However, one 
can construct a finite-dimensional approximation of Wt by projecting it onto 



a finite-dimensional subspace of L^[0,T] 19 . We use the projection as the 
driving noise in an approximation of the original signal. The transition map 
is then approximated as a function = /(xt^ ^ , t, zi, . . . , zn) that satisfies a 
certain ordinary differential equation. 

Ideas of this type were first explored by Wong and Zakai 20 . Similar ideas 



have been explored in 21 , 22 in the context of variance reduction for Monte- 
Carlo simulation, and in 23 in the context of parameter estimation. In this 
framework, one can interpret Xt as the image of an A'^-dimensional standard 
normal distribution under a nonlinear transform. This suggests the possibility 
of using sigma-point methods such as the unscented transform to construct a 
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Gaussian filter. 

Gaussian filters that currently exist in the literature typically rely on dis- 
cretisation of the signal. The time-i distribution of the discretised signal is 
repeatedly projected onto the set of Gaussian distributions (through moment 
matching or otherwise). Our methodology avoids repeated projection onto the 
space of Gaussian random variables. For this reason we expect improved per- 
formance over existing methods for significantly nonlinear inference problems. 

Our paper is structured as follows. In Section two, we describe our model 
of the filtering problem and briefly review some methods that are used in the 
literature at present. In Section three, we describe our method of approximat- 
ing the time-t marginal distribution of a diffusion process, and we show how the 
approximation can be exploited to construct a novel Gaussian filter. The accu- 
racy of this approximation is investigated in Section four, and in Section five 
we show that our filter performs well on a high-dimensional nonlinear problem. 



2 Gaussian Filtering 

2.1 Sigma point approximations 

Modern Gaussian filtering methods rely on so-called 'sigma point' approxima- 
tions, perhaps the best known of which is the unscented transform ll|12j . Given 



a random variable X and a function /, we wish to approximate the distribution 
of f{X). In order to accomplish this, one chooses a small number of points {ai} 
that represent the distribution of X in some sense. 

We will restrict out exposition to the case where X has an rt-dimensional 
multivariate normal distribution, and we wish to fit a multivariate normal dis- 
tribution to f{X). Suppose X mean m and covariance P. The unscented trans- 
form uses 2n + 1 sigma points, which are constructed as follows. One chooses 
two tuning parameters a and k, then sets X ^ a'^ (n + k) — n. The sigma points 
are then defined by the following expressions: 

a° = m, (6) 
a' = m+ {^/{n + X)P)„, 1 < i < n, (7) 
= m - (v/(n + A)P)„, 1 < i < n. (8) 

The sigma points are determined once one chooses an appropriate matrix 
square root. Here (\/P)*i is the i-th column of the matrix square root of P 

defined via P = \fP\fP ■ 

The mean and covariance of /(X) are approximated by a weighted average 
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of the sigma-point images. Define 3^,; = f{a^), and set 

E[f]^Y.^'^-^y^ (9) 

i=0 
2n 

E[{f E[mf ^ ^"f'^ (y^ - ^) (y^ ~ a^)^ (lo) 

i=0 
2n 

- m)(/ - Eif]^] J2 {^'' - ™) (y^ ~ m)^ ■ (11) 

1=0 

The weights depend on a third tuning parameter /3, and are given by 



w, 



(m) 




A 

n + A' 



4'^ = ^ + (l-«' + ^), 

72 -t~ A 



imi 1 . ^ 

.;-=^^ ^^l,...,2n. (12) 



(m) 



It is well known that the unscented transform matches the mean of ,f{X) exactly 
when / is a polynomial of degree three or less. In general, errors in the estimate 
of the mean are introduced only by the fourth and higher terms in the Taylor 



expansion of / 24 



2.2 Sigma point Kalman filters for diffusion processes 

In the Gaussian filtering paradigm, of which the unscented Kalman filter (UKF) 
is a special case, the filtering problem is reduced to computation of the condi- 
tional mean and covariance of the filtering distribution: 

mt = E[Xt\{Yt,:tk<t}] (13) 

and 

Pt = Coy [Xt\{Yt,:tu<t}]. (14) 

It is usually necessary to approximate the conditional mean and covariance: 
for a general nonlinear diffusion, the moments are only known in terms of the 
solution of a partial differential equation known as the Fokker-Planck equa- 
tion. In dimensions higher than three, the Fokker-Planck equation is typically 
numerically intractible. 

The simplest application of the UKF to a diffusion relies on discretisation 
of the process X. Suppose that at time ti-i we have an estimate of mt-^-^ and 
Pti-i- Our aim is to compute an estimate of rut and Pt the instant before the 
next observation arrives. Once the observation arrives, we perform a Bayesian 
update of the moments. 
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We divide the time interval [U^i, ti] into a number of sub- intervals of length 
At (for clarity, we will discuss the interval [0,ti] here). We then approximate 
the SDE Q on the grid {X^t, X2At, ■ ■ ■} via the relation 

X{k+l)At — fi^kAt, Zk), (15) 

where Zq, Zi, . . . is a suitable sequence of Gaussian random variables. Here, / 
is a transition function that depends on the method of discretisation, and Zk 
is typically draw from a spherical Gaussian distribution of dimension d. For 
example, in the Euler-Maruyama scheme [17| , 

f{XkAu Zk) ^ XkAt + a{XkAt)At + B{XkAt)^tZk, (16) 

where Zk = W^(fe+i)A - WkA In this sense, X(^k+i)At is the image of {XkAt, Zk) 
under a nonlinear transform /. Given a Gaussian approximation to XkAt, one 
can apply the unscented transform to / to find a Gaussian approximation of 
X(k+i)At- One proceeds iteratively until ti, at which point one must also update 
the predictive distribution. Instead of the Euler-Maruyama method, one can 
also use higher order Ito-Taylor expansions, stochastic Runge-Kutta methods 



or various other methods 17 



If t = ti, we will also make a new observation. We must then update our 
predictive distribution. Let m^. and be the mean and covariance of the 
predictive distribution immediately before the new observation arrives. Suppose 
our our estimate of the signal is Gaussian X^ ^ Af{m^.,P['). In Gaussian 
filtering, we compute statistics of the Gaussian approximation to the filtering 
distribution as follows: 

/i. = E[h{XiJ] 

S, = E[{hiX^J - ^ii){h{X^^) - ^,)^] + 
C, = E[{X--m-){h{X-)-^i,)^] 

ITT-ti = "1*^ + Ki{Yt^ - fj,i) 

Pu - p*; - kakJ, (17) 

The updated distribution has mean mf . and covariance Pf . . When the observa- 
tion function h is nonlinear, one can apply the unscented transform to h{X^.) to 



compute an approximation of ^i. Si, and Ci in (17). More complex update rules 
that have been tuned for numerical stability are also known in the literature [25] . 

Alternatively, instead of iteratively applying the unscented transform at the 
prediction, one can take a limit as At — 0, in which case one recovers a system of 
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differential equations for the predictive mean and covariance (see, e.g., 13 16]): 
drru 



dt 



E[a{X^)] (18) 



rip- 

-^=E[a{Xn{Xr-miy] (19) 

+ E[{X- -mi)a'{X-)] (20) 

+ E[B{X-)B'^iX-)], (21) 

where the expectations are taken with respect to the Gaussian distribution 

xr~AA(mr,pr)- 

On the other hand, in this paper we compute the predictive mean and vari- 
ance by constructing a function / so that Xt /(t, xq, Zi, . . . , Zjv), where the 
random variables Zi follow a standard Normal distribution. We can apply sigma 
point methods to / to estimate the mean and variance of Xt . This methodology 
allows us to avoid discretisation of the SDE, taking one large 'step' to compute 
the predictive moments instead of a large number of small ones. 



3 Sigma Point Filtering via Smooth Approxima- 
tions of Stochastic Differential Equations 

3.1 Series expansions of Brownian motion 

We now describe a method for obtaining a smooth approximation of Brownian 
motion by decomposing it in a generalised Fourier series. We aim to use the 
smooth approximation as a driving function in a differenial equation. This will 
enable us to approximate a nonlinear stochastic differential equation with a ran- 
domised ordinary differential equation, which will prove to be computationally 
tractable to work with. This approximation was used as the basis of a Markov 
chain Monte Carlo algorithm for Bayesian parameter estimation of a nonlinear 
diffusion in [23] . 

Suppose is a standard Brownian motion, and let {(f>i}i>i be an orthonor- 
mal basis of i^([0, T]\IR). We use the notation / to denote the indicator func- 
tion, which satisfies /{[o_t]}(u) = 1 when Q <u <t, and is equal to otherwise. 
One can construct a series expansion of W in terms of the basis functions 



as follows 26 







J2 Mn)dwA Mn)du. (22) 
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We use the standard inner product on L^[0, T], which is defined as 



if, 9) 



f{u)g{u)du. 



(23) 



The stochastic integrals are i.i.d standard normal since the basis functions are 
deterministic, and 

E I Mu)dWu =0, (24) 



E Mu)dWuj ^ 

(j>i{u)(f)j{u)du = (5.y. 
For ease of notation, we set 



(t)j{u)dWu 



(j>i{u)dWu 



(25) 



(26) 



If 14^ is a d-dimensional Brownian motion, then by an analagous construction, 
one has 



(27) 



where the random variables Zi are now d-dimensional standard normal. We can 
obtain an approximation of a Brownian sample path by drawing i.i.d samples 
Zi from a standard normal distribution and truncating the sum in (27). This 
allows us to describe a Brownian sample path approximately in terms of a finite 
number of variates. This representation is crucial for our implementation of 
sigma-point inference methods. 



3.2 Series Approximation of SDE 



In order to approximate the diffusion X, we truncate the series expansion (27), 
and use the resulting smooth process as an approximation of Brownian motion. 
We replace the stochastic integral in equation ([s]) with the time derivative of 
the truncated process: 



Xt=Xo + 



t ^ N 

a{Xu)du 



f B{Xu)Z, 



4>i(u)du. 



(28) 



Since X is driven by a finite linear combination of basis functions, the resulting 
process is differentiable. We can therefore interpret X as the solution to an 
ordinary differential equation with a random driving function. 

N 



dXt 
dt 



i{Xt) + J2B{Xt)Z,Mt): X, 



= Xq. 



(29) 



i=l 
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Approximations of this type were first investigated by Wong and Zakai 20 



who showed that in the one-dimensional case, Xt converges to the Stratonovich 
solution of the stochastic differential equation [Tt] . Recall that a Stratonovich 
SDE 

dXt = a{Xt)dt + B{Xt) o dWt (30) 



can be converted to an Ito diffusion and vice versa using the relationship 



f B{X,) 
Jo 



dWt = / B{Xt)dWt 



c{Xt)dt, 



(31) 



where the integral on the left is in the Stratonovich sense, and the i-th compo- 
nent of the vector c satisfies 



c\x) 



1 " a rji.k 



j=i k=i 



(32) 



In other words, the Stratonovich solution of an SDE is equivalent to the Ito 
solution with a modified drift. 

The multidimensional setting is somewhat more involved. In general, if 
{W„} is a sequence of piecewise smooth processes converging to a Brownian 
motion, one cannot guarantee {W^} ~^ W implies that the sequence of ap- 
proximate differential equations converges to the Stratonovich solution of the 
SDE. One must impose some extra conditions on the so-called 'Levy area' of 
the Brownian approximations. Let „ be the j-th component of Wn at time 
It, and let be the j-th component of W at time u. Define a set of processes 



qi-3 



) dW 



Sijt. 



(33) 



Many results about the convergence issue are known in the mathematical lit- 
erature. For example, suppose the following conditions hold with probability 1 
for all K less than some positive number 7: 



sup\\Wu-Wn.u\\=0{n-^), 

u<T 



sup 11^;^ J 

u<T 







f 




Jo 


du 



du = 0{\og\n)) V(5>0. 



(34) 
(35) 

(36) 



The thesis of Schmatkov 27 showed that under these assumptions. 



sup|iX„-X„|| =0(n-«). 

u<T 



(37) 



See 28 for an analogous result about stochastic partial differential equations. 



In general, there is no guarantee that the sequence of partial sums in (271 
converges uniformly (so (34) is not necessarily satisfied). If one chooses the 
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Haar wavelets as an orthonormal basis in which to expand the driving Brownian 
motion, then convergence is indeed uniform: in fact this choice corresponds to 
the Levy-Ciesielski construction of Brownian motion [29]. For a general choice 
of basis functions, one can show that X X provided that the processes are 
interpreted as rough paths (SOl . 



The approximation ( 29 ) has the advantage of re-casting an infinite dimen- 
sional problem in finite-dimensional terms. We can view the solution of ( |29[ ) as 
a function 

Xt = X{t, xo, 2^(1, 1) . . . , Z(d,Af))- (38) 

In essence, the time-i distribution of the process X can be interpreted as the 
image of a Gaussian distribution under a nonlinear transform. This is precisely 
the setting for which sigma-point methods were designed. 

3.3 The proposed series expansion filter 

Our algorithm proceeds as follows. We assume we have a Gaussian approxi- 
mation Af{mt^i, Pt-i) to the filtering distribution at time ti-i. We wish to 
compute the filtering distribution at time t. lft<ti, we compute the predictive 
distribution. If t = U, we must also update the predictive distribution with the 
information gained from our observation . . 

We choose a set {tr^ } of sigma points to represent the joint distribution of the 



state and the random coefficients Zi in (29). Each sigma point can be thought 
of as a vector of dimension n + d x N, 

a^^{al,al). (39) 

Here, the first n elements cr^ of the vector cr* are the sigma points for 



the initial condition for the ODE (291, that is the sigma points generated by 
Af{mt^i, Pt-i). The remaining d x N elements al are the sigma points for the 
coefficients of an A^-term expansion of a d-dimensional Brownian motion. To- 
gether, these data determine an initial value problem. For each sigma point cr*. 



we solve the ordinary differential equation ( 29 ) with initial condition Xt-_-^ = cr 



At time T, the solution is an ri-dimcnsional vector 

X^r^X{T,al,al). (40) 

We treat the solution at time T of the initial value problem as the image 
of the sigma point (t\ The set of vectors {Xj:} can be thought of as a discrete 
approximation to the filtering distribution. We can use these vectors to compute 
an estimate of rrij and Pt , though the specific computation depends on the choice 
of sigma-point method. This methodology is in marked contrast to the sigma 



point Kalman filters of Section 2.2 These rely on discretisation of the signal 
dynamics and sigma point approximation of the Brownian increment Wt+At—Wt 
at each timestep or a limiting case of this discretisation as At — >■ 0. 
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4 Approximation error 



We now present an analysis of the error induced by the series expansion approx- 
imation. We will see that it is possible to set the approximation error to at a 
fixed time t in this setting when working with a linear time-varying system. 

Analysis of the general case is considerably more difficult. One cannot easily 
exploit the usual tools from the theory of stochastic processes. The truncated 
driving noise does not possess the Markov property, nor is it a martingale. The 
truncated process is, however, a Gaussian process, and this structure is exploited 
in 30 to demonstrate convergence to the true SDE. In the later part of this 



section we present some numerical investigations in the nonlinear setting. 



4.1 Analysis of the linear case 

The multivariate linear SDE has dynamics given by 

dXt = A{t)Xtdt + BdWt, Xq^xo, 



(41) 



By a change of coordinates, we assume without loss of generality that Xq = 0. 
The solution of this SDE is 

(42) 



Xt = *(i)xo + *(i) / ^-\u)BdWu, 
Jo 



where 5* solves the homogenous ODE 
d 



dt 



(43) 



We fix an orthornormal basis {0^} of L^[0,T], and let denote the matrix 
(j)ild- The approximation X is then 

Xt = *(t)xo + ^ ^{t) (^j^ m-\u)B^,{u)di^ Z,. (44) 

The approximation error at time t is 
el{N) = E[\\Xt-Xtr] 

= E 



*(<) / ■^-^{u)BdWu 
Jo 

^ ^-\u)B^,{u)dv^ Z, 



2n 



(45) 



To simplify this expression, recall that for matrix-valued functions G and iJ, 



E 



t X T 

G{u)dW,^ 



H{u)dWu 



= / {G{u),H{u))Fdu, 
Jo 
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where {G{u), H{u)) p is the Frobenius inner product defined by 



{Giu),H{u))F = ^^G.,Hi?.,(u). (46) 

j 

Now, using the fact that Zi = J ^i{u)dWu, we can conclude that the error 
can be written in the form 

oo „t 

^t{N)= E / \\^m-\u)B^,{u)\\ldu. (47) 

i=N+l 

The only question that remains is how to choose the basis functions {4>i} to 
minimise the error. At a given time t, the approximation will be exact if we can 
find a set of matrices such that 'i'^^ {u)B^i(u)du = ioi i > N. 

Let [^^^{■)B]^,k denote the fc-th column of the matrix ^'^^(•)_B. We require 
that for 1 < fc < n 

/ [^-\u)B]^J,iu)du^O yi>N. (48) 
Jo 

In other words, when i > N, (f>i must be orthogonal (in L'^[0,T]) to the 
linear span of the functions {'i'^^{u)Bii, . . . ,'^~^{u)Bnd}- This will be the 
case if we set N — n and generate the first N basis functions by applying the 
Gram-Schmidt procedure to {4'~"'^(u)i3ii, . . . ^'^~^{u)Bnd] ■ 

The construction we have outlined above gives us a principled way in which 
to choose at most N — n x d basis functions for each Brownian component. 
When the basis functions are chosen as above, the approximation is exact at a 
single time T. In some applications, the diffusion matrix B will be sparse or 
low-rank, which means we will need far fewer than nx d basis function for exact 
simulation. 



It is worth remarking that we cannot expect to compute compute Zi in (261 
exactly given W for most choices of basis functions, since we rely on numerical 
approximation of the integral. Even so, this is a useful result because we can 
sample from a standard normal distribution instead. This allows us to generate 
weak solutions of the SDE. 

4.2 Analysis of the nonlinear case 

In the general nonlinear case, analytic solutions for nonlinear multi-dimensional 
ordinary differential equations are rarely available in closed form. Hence, it 
is difficult to establish precise bounds on the error induced by the series ex- 
pansion approximation. When the system of interest is almost linear, one can 



compute the optimal basis for the linear case as in Section 4.1 However, one 
can still obtain satisfactory results by using an orthonormal basis that is not 
optimal. In this section we aim to investigate properties of the series expansion 
approximation numerically. 
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We will test our approximation on a model of an aircraft turning in the 
(xijXs) plane. We model the motion of the aircraft using noisy dynamics that 
account for imperfections in the control system. The model also accounts for 
external forces such as wind that might affect the trajectory of the aircraft. We 
describe the state of the with a seven-dimensional vector Xi-j. The components 
{xi,X3,X5) represent the position of the aircraft in rectangular cartesian coor- 
dinates, while the components {x2,X4,xg) describe its velocity. The number 2:7 
describes the rate at which the aircraft is turning in the (a;i,X3) plane. 

The dynamics of the system are given by Q, with 



a{xi.,r) 



( ^2 \ 

x^ 

XjX2 

xe 





(49) 



B{xv.7) 
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1/ 



(50) 



Here, 



x'i and Vx 



Nonlinearities arise 



from two sources in this system. Firstly, the state-dependent covariance matrix 
causes the system to deviate from Gaussianity. Second, the random evolution 
of the turn rate x^ causes the aircraft to behave erratically. As the variance 
of grows, the system becomes more nonlinear and more non-Gaussian. A 
similar model was studied in 
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though in that case the diffusion matrix was 
assumed to be constant. Note that the state dependent covariance matrix makes 
Ito-Taylor and Runge-Kutta discretisations difficult to implement. 

In order to test the series expansion approximation, we simulated paths 
from X on the interval [0,8]. We set Xq = (1000,0,2650,150,200,0,6), and 
Cov{Wi,W2,W3,W4){t) = Diag(50,50,50,25)<, resulting in a highly nonlinear 
process. We took 100, 000 simulations from the Euler-Maruyama scheme as 
ground truth, having set At = .005. We simulated 100, 000 paths from the 
series expansion approximation with N = 1,4,6 and 10. The marginal means 
and standard deviations are shown in tables 1 and 2. Figure [T] shows a Q-Q 
plot of the Euler simulation vs. the series expansion simulation with N = 10, 
together with a plot of both densities. 
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Euler 


N = 1 


N = 4 


N = 6 


N = 10 


E[X{\ 


626 


549 


607 


612 


619 


E[X2] 


-59 


-91 


-65 


-63 


-61 


E[X^] 


3588 


3689 


3612 


3603 


3597 


E[X^] 


53 


82 


58 


56 


55 


E[X^] 


200 


200 


200 


200 


200 


E[X^] 

















E[Xj] 


6 


6 


5.9 


6 


5.9 



Table 1: Marginal mean values for Xl■^ as computed by the Euler scheme and 
series expansion approximations 





Euler 


N = 1 


N = 4 


N = 6 


N = 10 


Std(Xi) 


359 


151 


317 


333 


346 


Std(X2) 


90 


61 


86 


88 


89 


Std(X3) 


277 


128 


250 


261 


268 


Std(X4) 


93 


66 


90 


91 


92 


Std(X5) 


29 


17 


27 


28 


28 


Std(X6) 


6.4 


5.7 


6.2 


6.3 


6.3 


Std(X7) 


14.1 


12.8 


13.7 


13.9 


14.0 



Tabic 2: Marginal standard deviations for Xx-'j as computed by the Euler scheme 
and series expansion approximations 
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Figure 1: (Left) Q-Q plot of 100,000 samples from an Euler discretisation of 
vs. 100,000 samples from the series expansion approximation. Linearity of 
the plot suggests the distributions are very similar. (Right) Density plots of 
the samples. Draws from the Euler scheme are plotted using the solid line, and 
draws from the series expansion scheme are represented by the broken line. We 
used the Fourier sine series as a basis, with TV = 10. 

4.3 Choice of basis functions 

As we noted earlier, the solution of a system of nonlinear ODEs cannot usually 
be computed in closed form. For this reason, one cannot typically compute 
the optimal basis in which to expand W. One sensible test of the accuracy 
of a given series expansion method is to sample a large number of SDE paths 
using, for example, the stochastic Runge-Kutta method, and to compare those 
samples against samples from the series expansion method. Unfortunately, this 
effectively reduces the problem of finding a 'good' basis to trial and error. How- 
ever, we can suggest one heuristic method that appears to work well in practice, 
especially in the case of state-independent noise. 

Suppose we have a stochastic differential equation 

dXt = a{Xt)dt + BdWt. (51) 

We can Taylor expand the drift function about a point qi to obtain a local 
approximation 

aix) ^ a{qi) + Jg^ia){x - qi), (52) 

where Jq^ (a) is the Jacobian matrix of a at the point qi . The result is a linear 
approximation of the dynamics of X: 

dXf = a{qi)dt + Jg, {a){X^ ~ qi)dt + BdWu (53) 

and we can use the results of Section |4.1| to compute the optimal basis at time 
T for this approximation, which we denote . . . , (j^^^^}- 

We now choose a collection of points {q2, ■ ■ ■ , qk)- For each point, we linearise 
the SDE as before and solve for the optimal basis. This results in k sets of basis 
functions {((/'fi, . • ■ , (l^nd)}- Each set of basis functions corresponds to a different 
linearisation of the SDE. We can 'combine' these linearisations by applying the 
Gram-Schmidt procedure to the entire collection of functions, yielding nx dx k 
functions in total. 
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As an example of this approximation, we consider the 'double weU' process 



dXt = aXt{-f^ - Xf)dt + BdWt, xo = 3 (54) 

over the interval [0,4]. To test the heuristic method above, we set a = 1,7 = 
2,B — 3. We linearised the drift function about eight points (qi-.s), which 
were spaced uniformly in the interval [0, 1.5]. This gives a set of functions of the 
form exp(39^ — 4). The Gram-Schmidt procedure was applied to these functions. 



and the resulting orthonormal functions were used in the series expansion (22). 
Figure [2] shows a set of density plots of X at time 4 using various orthonormal 
bases, along with a numerical approximation of the solution of the Fokker-Planck 
equation. The heuristic method gives the best performance in this regime, but 
this is not necessarily the case for all parameter settings. 




Figure 2: Density plots of 20,000 samples from the double well process (54) at 
T = 4 using various basis functions, along with the numerical solution of the 
Fokker-Planck equation (solid line). The dashes correspond to Haar wavelets, 
the dots and dashes are from a Fourier sine series expansion, and the dotted 
line corresponds to the linearisation heuristic described above. The heuristic 
method matches the true density most closely. The disparity is less clear over 
shorter timescales. 



5 Filtering Experiments 

As the nonlinearity of the system increases, the speed at which the filtering 
distribution deviates from Gaussianity should also increase. Intuitively, this 
means the amount of information that the conventional UKF 'throws away' 
at each timestep grows with nonlinearity of the system. The series expansion 
method avoids this issue by targeting the predictive density at a given time 
directly without any intermediate discretisation or projection. As a result, we 
should expect the series expansion filter to outperform the conventional UKF 
in systems that are more highly nonlinear. 
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To test this hypothesis, we set the covariance of the four-dimensional Brown- 
ian motion driving the aircraft model to Cov(Wi, W2, W3, W4){t) = Diag(10, .2, .2, Qy^)t. 
The quantity Qw determines the variance of the turn rate of the aircraft. We 
use it as a proxy for the degree of nonlinearity of the system. We chose a num- 
ber of values for Qw^ ranging between Qw = -1 and Qw = 1.1- For each value 
of the variance, we simulated 1400 trajectories for the aircraft, running both 
filters on each trajectory. For each trajectory, the initial condition was drawn 
from a Gaussian distribution with mean mo = (1000, 0, 2650, 150, 200, 0, 6). The 
standard deviation of each component was set to 100, with the exception of the 
standard deviation of x-j which was set to .1, and components were assumed to 
be independent. 

For each trajectory, we simulated 20 observations, spaced T = 8 units of 
time apart. The observation function h models radar signals arriving at a dish. 
For this reason, we assume observations arrive in spherical coordinates, so that 
h is given by 

/ ^xl+ xl + xl \ 
Hxi-j) ^ tan ^(xs/xi) (55) 

\tan-^{x5/^/x[Txl)■/ 

The covariance matrix of the observation noise was set to i? = diag(50, .1, .1). 
For the standard unscented Kalman filter, an Ito- Taylor scheme such as the 



one proposed in 15 is impractical to implement. We used the limiting scheme 



first proposed in 13 . The system of ODEs ( 18 1 was solved by a fourth order 
Runge-Kutta scheme. The number of Runge-Kutta steps used did not appear 
to affect the error appreciably. However, with a large step size the predicted 
covariance can fail to be positive definite, which causes the filter to diverge. We 
found that a good compromise between computational cost and the divergence 
issue was to choose a smaller step-size for more highly nonlinear parameter 
settings. For this reason, we used 200a steps per unit time. 

We set a = 1, K = and (3 = 0. This choice of tuning parameters is 



also known as the cubature Kalman filter 14 15 . Various parameter settings 
produced similar results. 

For the series expansion method, we set 

'^.(t) = /|sin(i^il^) (56) 

(with T = 8), and used iV = 8 basis functions for each Brownian motion. The 
series expansion filter takes one large step instead of many small ones. As such, 
one can expect that the target distribution is less like a Gaussian distribution, 
and one must use care when choosing parameters for the unscented transform. 
We used a = 1,k = —32,(3 = so that A = 7. Choice of basis functions 
made minimal difference in this experiment. This may be because the Gaussian 
approximation and tuning parameters of the unscented transform have a larger 
effect on the filter than specifics of the series expansion approximation. 

For each component ( i.e. position, velocity, turn rate), we computed the 
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root mean squared error 



component 



nk ^ 

■i— 1 component 



(57) 



Here, is the mean of the fihering distribution at time i. We set fc = 3 for the 
position and velocity errors and k — \ for the turn rate error. The inner sum 
runs only over elements of . and that are appropriate to the component. 
For example when computing position error, the sum runs over X\,xi, and x^. 

Both the series expansion filter and unscented filter can diverge and lose 
track of the signal, in which case the error becomes very large. If one attempts 



to average the RMSE (57 1 over a number of runs, a single divergence event 



can dominate the average. This makes it somewhat difficult to compare errors 
between filters. One could impose an arbitrary cutoff and only average over 
errors below the cutoff. This could potentially give misleading results. Instead, 
we compute the errors euKF(i) and esE(i) for each run i, and consider the 
difference between the errors. 

Figure [3] shows the median values of the difference in errors together with the 
first and third interquartiles. The third interquartile corresponding to q = 1.1 
is excluded because the plot could not be scaled appropriately. For the position, 
the value is Q3 = 77m . For the velocity, Q3 = 67m/s, and for the turn rate, 
Q3 — 7.8 degrees/s. 



position errors 



turn rate errors 




Figure 3: The x-axis shows the standard deviation of Xj after one unit of time 
has elapsed. We use this as a measure of the nonlinearity of the system. For 
a range of values of Qw, we simulated 1400 trajectories of the signal, observed 
with noise. We plot median values of the difference in error between the un- 
scented Kalman filter and series expansion Kalman filter (solid line), together 
with the first and third quartiles (dashed lines). Errors were computed seper- 
ately for position, velocity and turn rate of the aircraft. The last point in the 
upper range is omitted because its inclusion would skew the scaling in the image. 
Values for these points can be foimd in section |5] 

Surprisingly, we found that choosing the symmetric square root of Pt instead 
of the Cholesky decomposition improved the accuracy of our algorithm consid- 
erably (though this choice did not improve performance of the standard UKF). 
The choice of matrix square root is known to affect fourth-order and higher 
terms in the Taylor expansion of the transition function /. This is in agreement 
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with our intuition: the transition function in the UKF is locally linear, and 
hence can be approximated with a low-order Taylor series. On the other hand, 
the series expansion filter uses a highly nonlinear transition function and one 
must consider higher order terms. 



6 Conclusion 

In this paper, we have presented a Gaussian filter based on the series expan- 
sion approximation. The novel contributions of this paper focus on improving 
the predictive distribution, so it is straightforward to construct a smoother us- 
ing similar methods: for example one can use the unscented smoother |6] or 



Gaussian smoother 31 directly. 

We have seen that the series expansion approximation enables us to ap- 
proximate the time-t marginal distribution of a diffusion process using a small 
number of auxilliary random variables Zi. Empirically, we observe that one 
needs far fewer auxilliary random variables than in standard time-discretised 
schemes, where one must introduce a new random variable for each timestep. 
As such, our method can be seen as a kind of dimensionality reduction scheme. 

The fact that one is working with a reduced number of auxilliary variables (in 
this paper, we needed tens rather than hundreds) means that one can fruitfully 
exploit existing numerical methods that otherwise might not scale sufficiently 
well. The computational requirements of the unscented transform grow linearly 
with the dimension of the filtering problem. Other sigma point methods, such 
as the Gauss-Hermite cubature rule, become exponentially more expensive as 
the number of basis functions grows and may not be as attractive. 

While the series expansion method allows us to avoid repeatedly project- 
ing the predictive distribution onto the space of Gaussian distributions (with 
concurrent loss of information), this does come at a price. One must exercise 
caution in picking tuning parameters for the sigma-point methods, since the 
transition function can be significantly nonlinear. 
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